Comments about the article in Nature: ChatGPT generates fake data set to support scientific hypothesis

Following is a discussion about this article in Nature Vol 623 30 November 2023, by Miryam Naddaf
To read the full text, select this link: https://www.nature.com/articles/d41586-023-03635-w. In the last paragraph I explain my own opinion.

Reflection


Introduction

Researchers have used the technology behind the artificial intelligence (AI) chatbot ChatGPT to create a fake clinical-trial data set to support an unverified scientific claim.
If you want to identify who is guilty, then these researchers are guilty. It is never the AI program.
In a paper published in JAMA Ophthalmology on 9 November, the authors used GPT-4 — the latest version of the large language model on which ChatGPT runs — paired with Advanced Data Analysis (ADA), a model that incorporates the programming language Python and can perform statistical analysis and create data visualizations. The AI-generated data compared the outcomes of two surgical procedures and indicated — wrongly — that one treatment is better than the other.
The program written in Python is the most important part in creating the false measurement data.
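To illustrate how little effort such fabrication takes, here is a minimal sketch in plain Python. Everything in it is hypothetical: the sample size, the distribution, and the "visual-acuity" scores are invented for the example, and nothing corresponds to the actual code ADA produced.

```python
import random
import statistics

random.seed(1)  # fixed seed so the fabricated "data" is reproducible

# Hypothetical sketch: fabricate outcome scores for 100 trial participants.
# Nothing here is measured; every value is simply drawn from a chosen
# distribution, yet the summary statistics look like a real trial result.
n = 100
fake_outcome = [random.gauss(0.8, 0.2) for _ in range(n)]  # invented scores

mean = statistics.mean(fake_outcome)
sd = statistics.stdev(fake_outcome)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```

A few lines like these, wrapped in a plausible spreadsheet layout, are all that is needed to produce a data set "not supported by real original data", exactly as the study's authors warn.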
“Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available,” says study co-author Giuseppe Giannaccare, an eye surgeon at the University of Cagliari in Italy.
It is a very simple task to modify the results of any experiment from honest data to false data.
The ability of AI to fabricate convincing data adds to concern among researchers and journal editors about research integrity.
In fact that is very easy to perform, but very difficult to detect.
“It was one thing that generative AI could be used to generate texts that would not be detectable using plagiarism software, but the capacity to create fake but realistic data sets is a next level of worry,” says Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California.
Detecting foul play is always the difficult part.

1. Surgery comparison

The authors instructed the large language model to fabricate data to support the conclusion that deep anterior lamellar keratoplasty (DALK) results in better outcomes than penetrating keratoplasty (PK). To do that, they asked it to show a statistical difference in an imaging test that assesses the cornea’s shape and detects irregularities, as well as a difference in how well the trial participants could see before and after the procedures.
In short, they manipulated the result of an experiment. That is easy for a human to do.
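A hedged sketch of how trivially such a "statistical difference" can be engineered: simply draw the two groups from distributions whose means were decided in advance, then run a standard two-sample test on them. The group sizes, means, and labels below are invented for the illustration and do not come from the JAMA Ophthalmology paper.

```python
import random
import statistics

random.seed(42)  # fixed seed for reproducibility

# Hypothetical sketch: two fabricated surgery groups, with the "better"
# outcome for group A written into the generator by choosing a higher mean.
n = 80
group_a = [random.gauss(0.9, 0.15) for _ in range(n)]  # invented "DALK" scores
group_b = [random.gauss(0.7, 0.15) for _ in range(n)]  # invented "PK" scores

# Welch's t statistic: a large |t| reads as a "significant" difference,
# even though the effect exists only because it was built into the data.
ma, mb = statistics.mean(group_a), statistics.mean(group_b)
va, vb = statistics.variance(group_a), statistics.variance(group_b)
t = (ma - mb) / ((va / n + vb / n) ** 0.5)
print(f"t = {t:.2f}")  # well beyond the ~2.0 threshold for p < 0.05
```

The test itself is computed honestly; the dishonesty is entirely in how the numbers were produced, which is why the fabrication is so hard to detect after the fact.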


Reflection 1


Reflection 2


If you want to give a comment, you can use the following form: Comment form


Created: 20 December 2022

Back to my home page Index
Back to Nature comments Nature Index